Data-Driven Speech Animation Synthesis Focusing on Realistic Inside of the Mouth

Authors

  • Masahide Kawai
  • Tomoyori Iwao
  • Daisuke Mima
  • Akinobu Maejima
  • Shigeo Morishima
Abstract

Speech animation synthesis remains a challenging topic in computer graphics. In particular, reproducing the detailed appearance of the inner mouth, such as the tip of the tongue nipped between the teeth or the back of the tongue, has not been achieved by existing methods. To solve this problem, we propose a data-driven speech animation synthesis method that focuses on the inside of the mouth. First, we classify the inner mouth into teeth, labeled by their opening distance, and a tongue, labeled according to phoneme information. We then insert these into an existing speech animation based on the teeth-opening distance and the phoneme information. Finally, we apply a patch-based texture synthesis technique, backed by a database of 2,213 images created from seven subjects, to the resulting animation. With the proposed method, a speech animation with a realistic inner mouth can be generated automatically from existing speech animations created by previous methods.
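Although the paper itself gives no code, the retrieval step it describes (exemplar images labeled by phoneme and teeth-opening distance, with the best match picked per frame) can be sketched as a simple nearest-neighbour lookup. All names and the toy data below are hypothetical, not the authors' implementation:

```python
# Illustrative sketch only: selecting an inner-mouth exemplar from a
# database labelled by phoneme and teeth-opening distance.

def select_exemplar(database, phoneme, opening_distance):
    """Pick the entry whose phoneme matches the current frame and whose
    labelled teeth-opening distance is closest to the target distance."""
    candidates = [e for e in database if e["phoneme"] == phoneme]
    if not candidates:           # no exemplar for this phoneme: fall back
        candidates = database    # to searching the whole database
    return min(candidates,
               key=lambda e: abs(e["opening"] - opening_distance))

# Toy database standing in for the paper's 2,213-image collection.
db = [
    {"image": "a_open.png", "phoneme": "a",  "opening": 12.0},
    {"image": "a_mid.png",  "phoneme": "a",  "opening": 6.0},
    {"image": "th_tip.png", "phoneme": "th", "opening": 3.0},
]

print(select_exemplar(db, "a", 7.5)["image"])  # nearest "a" exemplar: a_mid.png
```

In the actual method, the chosen exemplar would then be composited into the frame and refined with patch-based texture synthesis; this sketch covers only the lookup.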


Similar Articles

Photo-Realistic Mouth Animation Based on an Asynchronous Articulatory DBN Model for Continuous Speech

This paper proposes a continuous-speech-driven, photo-realistic visual speech synthesis approach based on an articulatory dynamic Bayesian network model (AF_AVDBN) with constrained asynchrony. To train the AF_AVDBN model, perceptual linear prediction (PLP) features and YUV features are extracted as the acoustic and visual features, respectively. Given an input speech and the trained AF_...


Speech-Driven Mouth Animation Based on Support Vector Regression

Visual mouth information has been proved to be very helpful for understanding the speech contents. The synthesis of the corresponding face video directly from the speech signals is strongly demanded since it can significantly reduce the amount of video information for transmission. In this paper, we present a novel statistical learning approach that learns the mappings from input voice signals ...


Speech-driven facial animation using a hierarchical model (IEE Proceedings - Vision, Image and Signal Processing)

A system capable of producing near video-realistic animation of a speaker given only speech inputs is presented. The audio input is a continuous speech signal, requires no phonetic labelling and is speaker-independent. The system requires only a short video training corpus of a subject speaking a list of viseme-targeted words in order to achieve convincing realistic facial synthesis. The system...


Image-based Talking Head: Analysis and Synthesis

In this paper, our image-based talking head system is presented, which includes two parts: analysis and synthesis. In the analysis part, a subject reading a predefined corpus is recorded first. The recorded audio-visual data is analyzed in order to create a database containing a large number of normalized mouth images and their related information. The synthesis part generates natural looking t...


Visual speech synthesis from 3D video

Data-driven approaches to 2D facial animation from video have achieved highly realistic results. In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance. Animation from real speech is performed by path optimisation over a graph representation of phonetically segmented captured 3D video. A novel similarity m...




Journal:
  • JIP

Volume 22, Issue -

Pages -

Publication year: 2014